Abstract
Introduction
The biological heterogeneity of human leukocyte antigen (HLA) molecules, including differences in B-leader peptides and peptide-binding motifs, has been increasingly recognized as a critical determinant of posttransplant immune responses. Subclassification of mismatched HLA alleles based on these molecular features has been reported to predict outcomes of mismatched allogeneic hematopoietic cell transplantation (allo-HCT). Protein language models (pLMs), which adopt architectures similar to large language models, can embed protein sequences into latent representations that are thought to capture their biochemical and structural properties. In this study, we aimed to identify HLA features associated with the development of acute graft-versus-host disease (aGVHD) in HLA-mismatched allo-HCT by leveraging latent representations generated by a pLM.
Methods
We retrospectively analyzed the clinical outcomes of patients (age ≥16 years) with hematological malignancies who underwent allo-HCT from unrelated donors in Japan between 2000 and 2022, with a one-allele mismatch (7/8 match) in the graft-versus-host direction at the HLA-A, -B, -C, and -DRB1 loci. Patient data were obtained from the Japanese Society for Transplantation and Cellular Therapy and the Japanese Data Center for Hematopoietic Cell Transplantation, using the Transplant Registry Unified Management Program. This retrospective study was designed in accordance with the Declaration of Helsinki and approved by the Ethical Committee of Osaka Metropolitan University Graduate School of Medicine (Osaka, Japan) (identification number 2024-167). HLA sequences were embedded using the 15-billion-parameter Evolutionary Scale Modeling 2 model (ESM2-15B), one of the largest pLMs. The dataset was randomly split into a training set (80%) and a test set (20%). To predict the incidence of aGVHD, we constructed an attention-based neural network that compressed the HLA embeddings of each donor-recipient pair into a one-dimensional continuous variable. We employed the Nnet-survival framework, a deep learning-based survival analysis model, using the incidence of grade II–IV aGVHD as the outcome label. The model took both HLA embeddings and clinical covariates as inputs. Model performance was evaluated using time-dependent receiver operating characteristic (ROC) curves. For model interpretability, SHapley Additive exPlanations (SHAP) values and attention maps were analyzed. Additionally, the clinical significance of the HLA latent feature was evaluated using a cause-specific Cox proportional hazards model.
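The compression step described above, in which per-residue HLA embeddings of a donor-recipient pair are reduced to a single continuous value through attention, can be illustrated with a minimal sketch. This is not the architecture used in the study: the function names (`attention_pool`, `pair_feature`), the single shared scoring vector, the concatenation of donor and recipient vectors, and the toy dimensions are all assumptions made for illustration, and random arrays stand in for real ESM2 embeddings.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_pool(residue_emb, w_score):
    """Pool a (length, dim) embedding matrix into one (dim,) vector.

    Each residue gets a scalar score; the softmax of the scores gives
    attention weights that sum to 1 across sequence positions.
    """
    scores = residue_emb @ w_score / np.sqrt(residue_emb.shape[1])
    weights = softmax(scores)
    return weights @ residue_emb, weights

def pair_feature(donor_emb, recipient_emb, w_score, w_out):
    """Compress a donor-recipient pair into one continuous value (hypothetical design)."""
    donor_vec, donor_attn = attention_pool(donor_emb, w_score)
    recipient_vec, recipient_attn = attention_pool(recipient_emb, w_score)
    pair_vec = np.concatenate([donor_vec, recipient_vec])
    return float(pair_vec @ w_out), donor_attn, recipient_attn

# Toy stand-ins for pLM embeddings: 12 residues, 8 dimensions (assumed sizes).
rng = np.random.default_rng(0)
seq_len, dim = 12, 8
donor = rng.normal(size=(seq_len, dim))
recipient = rng.normal(size=(seq_len, dim))
w_score = rng.normal(size=dim)    # would be learned during training
w_out = rng.normal(size=2 * dim)  # would be learned during training

feature, donor_attn, recipient_attn = pair_feature(donor, recipient, w_score, w_out)
```

In a sketch like this, the per-position weights (`donor_attn`, `recipient_attn`) are what an attention map would visualize along the HLA sequence, and `feature` plays the role of the one-dimensional variable passed to the survival model together with clinical covariates.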
Results
In total, 6,084 patients were included in the analysis, with 4,867 used for model training and 1,217 for testing. On day 30, the area under the ROC curve (AUC) was 0.604 for grade II–IV aGVHD and 0.622 for grade III–IV aGVHD. On day 100, the AUC was 0.538 for grade II–IV aGVHD and 0.545 for grade III–IV aGVHD. The attention map consistently highlighted the HLA peptide-binding pocket region, suggesting its biological relevance. According to SHAP values, the HLA embedding ranked as the fourth most important predictor among all 30 covariates, following conditioning intensity, use of antithymocyte globulin (ATG), and a diagnosis of acute myeloid leukemia (AML). In the cause-specific Cox proportional hazards model adjusted for established risk factors for aGVHD, the one-dimensional paired HLA embedding feature was not significantly associated with the incidence of grade II–IV aGVHD (hazard ratio [HR], 0.74; 95% confidence interval [CI]: 0.46–1.22; p = 0.24). In contrast, a significant association was observed for grade III–IV aGVHD (HR, 2.47; 95% CI: 1.01–6.03; p = 0.048).
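As a reading aid for the hazard ratios above, the following sketch shows how a two-sided Wald p-value can be approximately recovered from a reported HR and its 95% CI. It assumes the interval is a standard Wald interval, symmetric on the log scale; the abstract does not state how the intervals or p-values were computed, and the helper name `wald_p_from_ci` is hypothetical.

```python
import math

def wald_p_from_ci(hr, ci_low, ci_high, z_crit=1.959964):
    """Approximate two-sided Wald p-value from a hazard ratio and its 95% CI.

    Assumes the CI is exp(log(HR) +/- z_crit * SE), so the standard error
    of log(HR) can be read back from the width of the interval.
    """
    se = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)
    z = math.log(hr) / se
    # Two-sided normal tail probability: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    p = math.erfc(abs(z) / math.sqrt(2))
    return z, p

# Values as reported in the Results.
z_grade_2_4, p_grade_2_4 = wald_p_from_ci(0.74, 0.46, 1.22)
z_grade_3_4, p_grade_3_4 = wald_p_from_ci(2.47, 1.01, 6.03)
```

Because the published HR and CI bounds are rounded to two decimals, the recovered p-values are expected to be close to, but not identical with, the reported ones. The lower CI bound of 1.01 for grade III–IV aGVHD sits just above 1, which is consistent with a p-value just below 0.05.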
Conclusion
Latent HLA representations derived from a pLM contributed to the prediction of severe (grade III–IV) aGVHD in HLA-mismatched allo-HCT. Attention-based interpretability suggests that diversity within the HLA peptide-binding pocket may play a pivotal role in the immunopathogenesis of aGVHD. This approach offers a novel data-driven strategy for evaluating the immunogenetic risk of allo-HCT.